Coronaviruses have evolved diverse mechanisms to recognize
different receptors for their cross-species transmission and
host-range expansion. Mouse hepatitis coronavirus (MHV) uses the
N-terminal domain (NTD) of its spike protein as its
receptor-binding domain. Here we present the crystal structure of MHV
NTD complexed with its receptor murine carcinoembryonic
antigen-related cell adhesion molecule 1a (mCEACAM1a). Unexpectedly,
MHV NTD contains a core structure that has the same β-sandwich
fold as human galectins (S-lectins) and additional structural motifs
that bind to the N-terminal Ig-like domain of mCEACAM1a. Despite
its galectin fold, MHV NTD does not bind sugars, but instead binds
mCEACAM1a exclusively through protein-protein interactions.
Critical contacts at the interface have been confirmed by mutagenesis,
providing a structural basis for viral and host specificities of
coronavirus/CEACAM1 interactions. Sugar-binding assays reveal that
galectin-like NTDs of some coronaviruses such as human
coronavirus OC43 and bovine coronavirus bind sugars. Structural analysis
and mutagenesis localize the sugar-binding site in coronavirus NTDs
to a region above the β-sandwich core. We propose that coronavirus
NTDs originated from a host galectin and retained sugar-binding
functions in some contemporary coronaviruses, but evolved new
structural features in MHV for mCEACAM1a binding.

coronavirus evolution | galectin-like N-terminal domain of coronavirus spike 
proteins | carcinoembryonic antigen-related cell adhesion molecule 1 
binding by coronaviruses | sugar binding by coronaviruses 

Coronaviruses use a variety of cellular receptors and
coreceptors, including proteins and sugars. The diverse use of
receptors has allowed coronaviruses to infect a wide range of
mammalian and avian species and cause respiratory, enteric,
systemic, and neurological diseases. How coronaviruses have evolved
to do so has been a major puzzle in virology. To solve this puzzle,
we have investigated the structural basis for the complex
receptor-recognition mechanisms of coronaviruses.

The Coronaviridae family of large, enveloped, positive-stranded
RNA viruses consists of at least three major genera or groups (Table
S1). Aminopeptidase-N (APN) is the receptor for porcine
transmissible gastroenteritis virus (TGEV), porcine respiratory
coronavirus (PRCoV), and human coronavirus 229E (HCoV-229E) from
group 1 (1-3). Carcinoembryonic antigen-related cell adhesion
molecule 1 (CEACAM1), a member of the carcinoembryonic
antigen family in the Ig superfamily, is the receptor for mouse hepatitis
coronavirus (MHV) from group 2 (subgroup 2a) (4, 5).
Angiotensin-converting enzyme 2 (ACE2) is the receptor for human coronavirus
NL63 (HCoV-NL63) from group 1 and human severe acute
respiratory syndrome coronavirus (SARS-CoV) from group 2
(subgroup 2b) (6, 7). Sugars serve as receptors or coreceptors for TGEV
from group 1, bovine coronavirus (BCoV) and human coronavirus
OC43 (HCoV-OC43) from group 2a, and avian infectious bronchitis
virus (IBV) from group 3 (8-14). The receptor is unknown for some
coronaviruses such as group 2a human coronavirus HKU1
(HCoV-HKU1). The diversity in receptor use is a distinctive feature of the
Coronaviridae family and a few other virus families such as
retroviruses and paramyxoviruses (15, 16).

The characteristic large spikes on coronavirus envelopes are
composed of trimers of the spike protein. The spike protein
mediates viral entry into host cells by functioning as a class I viral
fusion protein (17). During maturation, the spike protein is often
cleaved into a receptor-binding subunit S1 and a membrane-fusion
subunit S2 that associate through noncovalent
interactions (Fig. 1A). S1 sequences are relatively well conserved
within each coronavirus group, but differ markedly between
different groups. S1 contains two independent domains, N-terminal
domain (NTD) and C domain, that can both serve as viral
receptor-binding domains (RBDs) (Table S1). C domain binds to APN or
ACE2 in coronaviruses that use them as receptors (3, 18-24),
whereas NTD binds to CEACAM1 in MHV or sugar in TGEV (9,
25). The sugar-binding domain has not yet been identified in the
spike proteins of HCoV-OC43, BCoV, or IBV. The only atomic
structures available for coronavirus S1 are C domains of
HCoV-NL63 and SARS-CoV, each complexed with their common
receptor ACE2 (23, 24). Despite marked differences in their
structures, the two C domains bind to overlapping regions on ACE2
(23, 24). Structural information has been lacking for any
coronavirus S1 NTD.

MHV, the prototypic and extensively studied coronavirus, causes
a variety of murine diseases. Strain A59 (MHV-A59) is primarily
hepatotropic, whereas strain JHM (MHV-JHM) is neurotropic.
Murine CEACAM1 (mCEACAM1), whose primary physiological
functions are to mediate cell adhesion and signaling, is the
principal receptor for all MHV strains (26). mCEACAM1a is broadly
expressed in epithelial cells, endothelial cells, and macrophages, but
its expression level is low in the central nervous system and is
restricted to endothelial and microglial cells (27). mCEACAM1
is encoded by two alleles to produce mCEACAM1a and -1b;
mCEACAM1a is a much more efficient MHV receptor than
mCEACAM1b (28, 29). mCEACAM1a contains either two [D1 and
D4] or four [D1-D4] Ig-like domains in tandem, a result of
alternative mRNA splicing (26). The crystal structure of mCEACAM1a[1,4]
shows that a CC' loop (loop connecting β-strands C and C') in
the V-set Ig-like domain D1 encompasses key MHV-binding
residues including Ile41 (30). Curiously, although mammalian
CEACAM1 proteins are significantly conserved (Tables S2 and S3), only
murine CEACAM1a can serve as an efficient MHV receptor (31).
Also, although group 2a coronavirus spike proteins are significantly
conserved (Tables S2 and S3), only MHV spike protein interacts
with mCEACAM1a (31). The molecular determinants for the viral
and host specificities of coronavirus/CEACAM1 interactions
remain elusive.

Despite the lack of sequence homology between coronavirus
spike proteins and any known sugar-binding proteins (lectins),
sugar moieties on host cell membranes such as glycoproteins,
glycolipids, and glycosaminoglycans play important roles in host cell
infections by many coronaviruses (8). HCoV-OC43 and BCoV
spike proteins recognize cell-surface components containing
N-acetyl-9-O-acetylneuraminic acid (Neu5,9Ac2) (13, 14). These
viruses also contain a hemagglutinin-esterase (HE) that functions
as a receptor-destroying enzyme and aids viral detachment from
sugars on infected cells (32). MHV spike protein does not bind
sugars (33), and the HE genes of many MHV strains are present but
not expressed (34). TGEV spike protein recognizes
N-glycolylneuraminic acid (Neu5Gc) and N-acetylneuraminic acid (Neu5Ac), and
such sugar-binding activities are required for the enteric tropism
of TGEV (9). PRCoV spike protein, an NTD-deletion mutant of
TGEV spike protein, fails to bind sugars, and hence PRCoV has
respiratory tropism only (3, 10). IBV spike protein recognizes
Neu5Ac (11, 12). Overall, many coronavirus spike proteins can
function as viral lectins, but the molecular nature of their lectin
activities is a mystery.

Fig. 1. Structure of MHV-NTD/mCEACAM1a complex. (A)
Domain structure of MHV spike protein. NTD: N-terminal
domain; RBD: receptor-binding domain; HR-N: heptad-repeat N;
HR-C: heptad-repeat C; TM: transmembrane anchor;
IC: intracellular tail. The signal peptide corresponds to
residues 1-14 and is cleaved during molecular maturation
(45). Structures and functions of gray areas have not been
clearly defined. (B) Kinetics and binding affinity of NTD and
mCEACAM1a. (C) Structure of NTD/mCEACAM1a complex.
Two β-sheets of the NTD core are in green and magenta,
respectively; receptor-binding motifs (RBMs) are in red;
other parts of the NTD are in cyan; mCEACAM1a is in
yellow; and virus-binding motifs (VBMs) are in blue. N*:
N terminus; C*: C terminus. (D) Sequence and secondary
structures of NTD. β-Strands are shown as arrows, and the
disordered region as a dashed line.

How did coronavirus spike proteins originate and evolve, and
how did the evolutionary changes in their spike proteins allow
coronaviruses to explore novel cellular receptors and expand their
host ranges? To address these questions, we have determined the
crystal structure of MHV-A59 NTD complexed with
mCEACAM1a[1,4]. The structure has elucidated the
receptor-recognition mechanism of MHV and identified the determinants of the
viral and host specificities of coronavirus/CEACAM1 interactions.
Furthermore, the NTD structure has unexpectedly revealed the
structural basis for the lectin activities of coronavirus spike proteins
and provided structural insights into the origin and evolution of
coronavirus spike proteins.

Results and Discussion 

Structure Determination. To prepare an MHV-A59 NTD
fragment suitable for crystallization, we designed a series of
C-terminal truncation constructs of MHV-A59 NTD based on its
secondary structure predictions. One NTD fragment containing
residues 1-296 was well expressed in insect cells and stable in
solution. It bound to mCEACAM1a[1,4] to form a 1:1 heterodimeric
complex with a Kd of 21.4 nM (Fig. 1B). We crystallized this complex
in space group P6₁22, a = 76.4 Å, b = 76.4 Å, and c = 942.1 Å
(Table S4), with two complexes per asymmetric unit (Fig. S1). The
structure was determined by single-wavelength anomalous
diffraction (SAD) phasing using selenomethionine-labeled mCEACAM1a.
The phases were subsequently improved by an averaging method
(35). We refined the structure at 3.1 Å resolution (Table S4). The
final model contains residues 15-268 of NTD (except for a disordered
loop from residues 40-64) and residues 1-202 of mCEACAM1a.
The model also contains glycans N-linked to viral residue 192 and to
receptor residues 37, 55, and 70.
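
To give a feel for what a Kd of 21.4 nM means for a 1:1 complex, the following minimal sketch evaluates the standard single-site (Langmuir) binding relations. It is an illustration only, not the fitting procedure used for Fig. 1B: the function names are hypothetical, the receptor concentration is assumed to be undepleted by binding, and the rate constants in the usage note are invented placeholders, not measured values.

```python
def fraction_bound(ligand_nM: float, kd_nM: float) -> float:
    """Equilibrium fraction of NTD occupied by receptor in a 1:1 binding model.

    Assumes a single independent site and free ligand held at ligand_nM.
    """
    if ligand_nM < 0 or kd_nM <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return ligand_nM / (ligand_nM + kd_nM)


def kd_from_rates(k_off_per_s: float, k_on_per_M_s: float) -> float:
    """Dissociation constant in molar units from kinetic rate constants (Kd = koff / kon)."""
    if k_on_per_M_s <= 0:
        raise ValueError("k_on must be > 0")
    return k_off_per_s / k_on_per_M_s


KD_NM = 21.4  # value reported in the text for NTD/mCEACAM1a[1,4]

if __name__ == "__main__":
    # Occupancy at 0.1x, 1x, and 10x Kd.
    for conc in (2.14, 21.4, 214.0):
        print(f"{conc:7.2f} nM -> {fraction_bound(conc, KD_NM):.3f} bound")
```

At a receptor concentration equal to Kd, half of the NTD is occupied; a tenfold excess gives about 91% occupancy. As a purely hypothetical example of the kinetic relation, `kd_from_rates(2.14e-3, 1.0e5)` returns a Kd of 2.14e-8 M, that is, 21.4 nM.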

Structure of MHV-NTD/mCEACAM1a Complex. The core of MHV
NTD is a 13-stranded β-sandwich with two antiparallel β-sheets
stacked against each other through hydrophobic interactions (Fig. 1
C and D and Fig. 2A). Surprisingly, this β-sandwich core of MHV
NTD has a galectin fold, which will be discussed in detail later.
Outside the core structure, MHV NTD contains several peripheral
structural elements (Fig. 1C and Fig. 2A). Three loops from the
"upper" β-sheet of the core converge with the N-terminal segment to
form a distinct, receptor-contacting substructure; these four
structural elements are termed receptor-binding motifs (RBMs). Three
disulfide bonds, connecting cysteines 21-158, 165-246, and 153-187,
reinforce MHV NTD (Fig. 2B).

MHV NTD binds to domain D1 of mCEACAM1a (Fig. 1C and
Fig. 2A). There is no significant structural change in mCEACAM1a
before and after MHV binding (Fig. S2). The four RBMs of MHV
NTD contact two virus-binding motifs (VBMs) on the CC'C" face
of mCEACAM1a (Fig. 1C and Fig. 2A). A total of 14 residues in
NTD interact with a total of 17 residues in mCEACAM1a (Fig. 3
A-C). The binding buries 1,500 Å² at the interface (Fig. 2B). This
buried area is intermediate between that of the SARS-CoV/ACE2
interface (1,700 Å²) and that of the HCoV-NL63/ACE2 interface (1,300
Å²), but all three viral receptor-binding domains bind to
their respective protein receptors with similar affinities (23).
Notably, none of the observed or predicted glycans is involved in
MHV-NTD/mCEACAM1a interactions (Fig. 2B), and therefore
MHV-NTD/mCEACAM1a binding depends exclusively on
protein-protein interactions.

Detailed MHV-NTD/mCEACAM1a Interactions. The interface between
MHV NTD and mCEACAM1a is dominated by hydrophobic
interactions with scattered polar interactions. Two hydrophobic
patches stand out. One centers on Ile41 from the CC' loop
of mCEACAM1a. Ile41 is surrounded by the hydrophobic side
chains of MHV Tyr15, Leu89, and Leu160 and the aliphatic side
chains of MHV Gln159 and Arg20 (Fig. 2C). Additionally, MHV
Arg20 forms a bifurcated hydrogen bond with the main chain
carbonyl group of receptor Thr39 and another hydrogen bond with
MHV Gln159. Receptor Arg96 forms a bifurcated salt bridge with
receptor Asp89, while stacking with MHV Arg20. The second
critical hydrophobic patch involves multiple hydrophobic residues
from both MHV and mCEACAM1a that include MHV residues
Ile22 and Tyr162 and receptor residues Val49, Met54, and Phe56
(Fig. 2D). Additionally, MHV Asn26 forms a hydrogen bond with
the main chain amide group of receptor Thr57. In protein-protein
interactions, hydrophobic interactions contribute more to binding
energy, whereas hydrophilic interactions contribute more to
binding specificity. The above key hydrogen bonds between NTD side
chains and the receptor main chain help bring the adjacent
hydrophobic patches into place. These structural analyses suggest
that the hydrophobic patches and additional polar interactions
provide significant binding energy and specificity to
MHV-NTD/mCEACAM1a binding.

Fig. 2. Structural details of MHV-NTD/mCEACAM1a interface. (A) Another
view of the MHV-NTD/mCEACAM1a structure, which is derived by rotating
the one in Fig. 1C 90° clockwise along a vertical axis. Virus-binding motif 1
(VBM1) on MHV NTD includes strands βC, βC', and βC" and loops CC', C'C",
and C"D. VBM2 on MHV NTD corresponds to loop FG. (B) Distribution of
glycosylation sites and disulfide bonds. Glycans and glycosylated asparagines
are in magenta, and cysteines are in yellow. The orientation of the structure
is the same as in Fig. 1C. (C) A hydrophobic patch at the interface that is
important for MHV-NTD/mCEACAM1a binding. MHV residues are in
magenta, and mCEACAM1a residues are in green. The orientation of the
structure is derived by rotating the structure in Fig. 1C 180° along a vertical
axis. (D) Another hydrophobic patch at the interface that is important for
MHV-NTD/mCEACAM1a binding. The orientation of the structure is slightly
adjusted from the one in Fig. 1C.

The importance of contact residues at the NTD/mCEACAM1a
interface has been confirmed by mutagenesis data. Here we
compared the efficiency of mCEACAM1a-dependent cell entry by
lentiviruses pseudotyped with wild-type or mutant MHV-A59 spike
protein (Fig. 3D). Our data, together with published data (Fig. 3E),
showed that single mCEACAM1a substitution I41G and single
NTD substitutions I22A, Y162A, Y162Q, Y162H, but not Y162F,
abrogated viral infectivity (28, 36), underscoring the significance
of the two hydrophobic patches. Furthermore, single NTD
substitutions R20A, R20K, and N26A significantly decreased viral
infectivity, confirming the significance of the hydrogen bonds between
Arg20 and mCEACAM1a and between Asn26 and mCEACAM1a.
A naturally occurring Q159L mutation in MHV NTD caused small
viral plaques (Fig. 3E) (37), suggesting that the hydrogen bond
between Gln159 and Arg20 helps position Arg20 to interact with
mCEACAM1a. Therefore, through evolution MHV-A59 appears to
have optimized many of its interactions with mCEACAM1a, and
thus substitutions in MHV NTD that disrupt these specific
molecular interactions weaken or abrogate viral infectivity.

10698 | www.pnas.org/cgi/doi/10.1073/pnas.1104306108

Our study provides the structural basis for the viral and host
specificities of coronavirus/CEACAM1 interactions. On the basis
of structural analyses and mutagenesis data, we determined that
MHV NTD contains Arg20, Ile22, Asn26, and Tyr162, all of which
form energetically favorable interactions with mCEACAM1a, whereas
other group 2a coronavirus NTDs contain residues at the
corresponding positions that are expected to disrupt critical hydrophobic
or polar interactions with mCEACAM1a (Fig. 3B and Fig. S3). On
the other hand, on the basis of structural analyses, we found that
mCEACAM1a contains Ile41, Val49, Met54, and Phe56, all of
which form energetically favorable interactions with MHV,
whereas mCEACAM1b, bovine CEACAM1a and -1b, and human
CEACAM1 contain residues at the corresponding positions that
likely disrupt these favorable interactions with MHV (Fig. 3C and
Fig. S3). For example, hydrophobic residues Ile41 and Phe56 in
mCEACAM1a become hydrophilic residues Thr41 and Thr56 in
mCEACAM1b (Fig. 3C and Fig. S3). These results reveal the
mechanisms whereby MHV uses only mCEACAM1a, and not
mCEACAM1b or CEACAM1 from cattle or humans, as its
receptor, and whereby other group 2a coronaviruses cannot use
mCEACAM1a as a receptor.

Sugar Binding by Coronavirus NTDs. The β-sandwich core of MHV
NTD shares the same 11-stranded fold as human galectins
(S-lectins) and rotavirus VP4 (viral lectin) (38, 39), augmented by
two additional β-strands in the "lower" β-sheet (Fig. 4 and Fig. S4).
MHV NTD and human galectin-3 have a Dali Z-score of 7.8 and
an rmsd value of 2.9 Å over 137 matching Cα atoms (40).
Importantly, the topologies of their β-sandwich cores are identical (Fig.
S4). This unexpected structural homology between MHV NTD and
human galectins suggests that coronavirus NTDs may function as
viral lectins. To test this possibility, we designed NTD constructs of
other group 2 coronaviruses that correspond to the crystallized
MHV-NTD fragment based on the sequence alignment of these
spike proteins. We expressed and purified each of these
coronavirus NTDs and detected their binding interactions with mucin,
a mixture of highly glycosylated proteins containing all of the sugar
moieties (Neu5,9Ac2, Neu5Gc, and Neu5Ac) recognized by the
coronavirus spike proteins. Results showed that NTDs of
HCoV-OC43 and BCoV bound sugars, whereas NTDs of MHV-A59,
HCoV-HKU1, and SARS-CoV did not (Fig. 5). Removal of
neuraminic acids from mucin by neuraminidase treatment prevented
binding of HCoV-OC43 or BCoV NTD.
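
The rmsd quoted above for MHV NTD and galectin-3 is a root-mean-square distance over structurally equivalent Cα atoms. The sketch below shows only that final arithmetic step, under two stated assumptions: the residue pairing has already been chosen, and the two coordinate sets have already been optimally superposed. Dali performs both the alignment and the superposition itself, so this is not a reimplementation of Dali, and the function name and coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Point], coords_b: Sequence[Point]) -> float:
    """Root-mean-square deviation (in the input units, e.g. Å) between paired atoms.

    coords_a[i] and coords_b[i] must be the i-th matched pair of Cα atoms,
    given in a common, already-superposed coordinate frame.
    """
    if len(coords_a) != len(coords_b) or len(coords_a) == 0:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


if __name__ == "__main__":
    # Toy coordinates: every pair is displaced by exactly 3 Å along z.
    a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    b = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
    print(rmsd(a, b))
```

Because the value is an average over only the matched atoms, an rmsd should always be read together with the number of atoms it covers, as the text does by reporting 137 matching Cα atoms.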

Why do the NTDs of HCoV-OC43 and BCoV, but not that of
MHV, bind sugars, and where is the sugar-binding site located in
coronavirus NTDs? In human galectin-3 that binds galactose, the
sugar-binding site (site A) is located above the β-sandwich core and
involves the 10-11 loop (loop connecting β10 and β11) (Fig. 4 B
and E) (39). In rotavirus VP4, which binds sialic acids, site A is
blocked by a two-stranded β-sheet; instead, the sugar-binding site
(site B) is located in a groove between the two β-sheets of the
β-sandwich core (Fig. 4 C and F) (38). In MHV NTD, site B is
blocked due to the narrowed groove between the two β-sheets of
the β-sandwich core, whereas site A is open and available (Fig. 4 A
and D). However, compared with human galectin-3, MHV NTD
has a markedly shortened 10-11 loop that may be responsible for
its lack of lectin activity. HCoV-OC43 and BCoV NTDs likely
share the same galectin fold as MHV NTD due to their high
sequence similarities (Fig. S1), but they both contain longer 10-11
loops than MHV NTD (Fig. 3B and Fig. S3) and thus may use site
A for sugar binding. To test this hypothesis, we modified the 10-11
loops in both BCoV and HCoV-OC43 NTDs, using MHV NTD as
a reference (Fig. 3B and Fig. S3). For both BCoV and
HCoV-OC43 NTDs, the mutant and wild-type proteins were equally well
expressed and stable in solution, but the mutant proteins (OC43*
and BCoV*) lacked sugar-binding activities (Fig. 5). These
observations confirm that the 10-11 loops are critical for sugar binding in
both BCoV and HCoV-OC43 NTDs. A more refined description of
the sugar-binding site in BCoV and HCoV-OC43 NTDs awaits
future structural and biochemical studies.

Fig. 3. Sequence analysis and mutagenesis studies of
coronavirus/CEACAM1 interactions. (A) List of contact
residues at the interface. (B) Partial sequence alignment
of group 2a coronavirus NTDs. Contact residues are in red,
important noncontact residues are in blue, and loop 10-11
is in green. Asterisks indicate positions that have fully
conserved residues. Colons indicate positions that have
strongly conserved residues. Periods indicate positions
that have weakly conserved residues. (C) Partial sequence
alignment of mammalian CEACAM1 proteins. (D)
Structure-guided mutagenesis data on MHV NTD. Measured
was mCEACAM1a-dependent cell entry by lentiviruses
pseudotyped with wild-type or mutant MHV-A59 spike
proteins. SEs are shown. (E) Published mutagenesis data
on MHV NTD (28, 36, 37).

Coronavirus Receptor Use and Evolution. To date, three crystal
structures are available for RBDs of coronavirus S1: group 2a
MHV NTD, group 2b SARS-CoV C domain (24), and group 1
HCoV-NL63 C domain (23) (Fig. 6). Because of the significant
sequence similarities of the S1 subunits of the spike proteins within
each coronavirus group, the six-stranded β-sandwich core structure
of the HCoV-NL63 C domain likely exists in other group 1
coronaviruses (23), and the five-stranded β-sheet core structure of the
SARS-CoV C domain likely exists in other group 2 coronaviruses
(24). Similarly, the galectin-like NTD of MHV likely exists in other
group 2 coronaviruses. The folds of group 1 and group 3
coronavirus NTDs are less clear. However, because both TGEV NTD and
IBV S1 have lectin activities, the galectin-fold core structure of
group 2a coronavirus NTDs may also be found in both group 1 and
group 3 coronaviruses in similar or variant forms. The present
study advances our understanding of the structures and functions
of coronavirus spike proteins and the complex receptor-recognition
mechanisms of coronaviruses.



Fig. 4. Structural comparisons of MHV NTD, human galectins,
and rotavirus VP4. (A) MHV NTD. The orientation of the
structure is the same as in Fig. 1C. (B) Human galectin-3
[Protein Data Bank (PDB) 1A3K]. The β-sandwich core is
labeled and colored the same as in MHV NTD. Bound galactose
is in yellow. (C) Rotavirus VP4 (PDB 1KQR). Bound sialic acid is
in yellow. (D) Another view of MHV S1 NTD, which is derived
by rotating the structure in A counterclockwise along a
vertical axis. Arrow indicates loop 10-11. (E) Another view of
human galectin-3. Site A indicates its galactose-binding site. (F)
Another view of rotavirus VP4. Site B indicates its
sialic-acid-binding site.

Fig. 5. Sugar-binding assays of group 2 coronavirus NTDs. (A) Dot-blot
overlay assay. Measured were the binding interactions between coronavirus
NTDs and sugar moieties on mucin-spotted nitrocellulose membranes. The
membranes were either mock-treated or treated with neuraminidase (Nase)
beforehand. Sugar-binding NTDs were detected using antibodies against
their C-terminal His tags. (B) ELISA in which mucin-coated plates were used
instead of nitrocellulose membranes. Sugar-binding NTDs were detected
using ELISA substrates, and absorbance of the resulting yellow color was
read at 450 nm. SEs are shown. BCoV* and OC43*: BCoV and HCoV-OC43
NTDs whose 10-11 loops have been replaced by that of MHV NTD.


How did coronavirus spike NTDs originate and evolve? We
propose that an ancestral coronavirus acquired a galectin-like
domain from its host. Subsequently, an ancestral group 2a
coronavirus incorporated an HE gene into its genome to aid viral detachment
from sugars on infected cells. Later, the galectin-like NTD of MHV
evolved additional novel structural elements that allowed it to bind
mCEACAM1a. Using a protein receptor instead of sugar receptors
greatly enhanced the attachment affinity between MHV and
murine cells, making sugar-binding functions dispensable.
Accordingly, MHV-A59 underwent changes in the sugar-binding site of its
NTD, lost its sugar-binding activity, and stopped expressing its HE
gene. In contrast, the galectin-like NTDs of some contemporary
coronaviruses such as HCoV-OC43, BCoV, and TGEV retain the
lectin activity, although their sugar specificities have diverged in
three coronavirus groups and differ from those of contemporary
human galectins. Some TGEV strains deleted their NTD to
become PRCoV after their C domain acquired APN-binding affinity.
In addition to coronaviruses, there is evidence that paramyxoviruses
may also have acquired their RBD from a host (although with a
β-propeller fold) and used it to bind sugar or protein receptors (41, 42).
It seems that viruses use a common evolutionary strategy of
acquiring host proteins and evolving them into viral RBDs with different
receptor specificities. This strategy allows viruses to explore novel cellular
receptors and expand their host ranges. The current study provides
critical structural information that illustrates how coronaviruses have
successfully used this strategy.
Fig. 6. Structures, functions, and evolution of S1 subunits of coronavirus
spike proteins. The three known crystal structures are indicated by
"Structure." Among these structures, MHV NTD has a 13-stranded galectin-like
β-sandwich fold, HCoV-NL63 C domain has a six-stranded β-sandwich fold,
and the SARS-CoV C domain has a five-stranded β-sheet fold.


Materials and Methods

Protein Purification and Crystallization. Both MHV-A59 S1 NTD (residues 1-296)
and mCEACAM1a[1,4] (residues 1-202) were expressed in Sf9 insect cells using
the Bac-to-Bac system (Invitrogen) and then purified as previously described
(24). Briefly, the proteins containing C-terminal His tags were harvested from
Sf9 cell supernatants, loaded onto a nickel-nitrilotriacetic acid (Ni-NTA)
column, eluted from the Ni-NTA column with imidazole, and further purified
by gel filtration chromatography on Superdex 200 (GE Healthcare). To purify
the NTD/mCEACAM1a complex, NTD was incubated with excess mCEACAM1a
before the complex was purified by gel filtration chromatography and
concentrated to 10 mg/mL. Crystals of the NTD/mCEACAM1a complex were
grown in sitting drops at 4 °C over wells containing 10% PEG6000 and 0.2 M
CaCl2. Crystals were harvested in 2 wk; stabilized in 10% PEG6000, 0.2 M
CaCl2, and 30% ethylene glycol; and flash-frozen in liquid nitrogen.

Selenomethionine-labeled mCEACAM1a was expressed in Sf9 cells as
previously described (43). Briefly, 24 h postinfection, cells were transferred
into medium without methionine for methionine depletion. After 4 h, cells
were transferred into medium without methionine but supplemented with
50 mg/mL selenomethionine for selenomethionine labeling. After 36 h, cells
were harvested and selenomethionine-labeled protein was purified using
the same procedure as above.

Structure Determination and Refinement. X-ray data were collected at
Advanced Photon Source Northeastern Collaborative Access Team beamlines.
The crystal contains two complexes per asymmetric unit. From a
selenomethionine-labeled crystal, 12 selenomethionine sites in mCEACAM1a were
identified. SAD phases were then calculated. The phases were subsequently
improved by twofold noncrystallographic symmetry (NCS) averaging within
the crystal and cross-crystal averaging with the mCEACAM1a crystal (35).
During both density modification and structure refinement, the twofold NCS
restraint was applied to each of the three domains (NTD and domains D1
and D4 of mCEACAM1a) between the two complexes in each asymmetric
unit. This is because domains D1 and D4 undergo a hinge movement relative
to each other in the two complexes due to the flexibility of the domain linker.
The structure was refined to a final Rfree of 30.8% and Rwork of 24.8%. Data,
phasing, and refinement statistics are shown in Table S4. Software used for data
processing, structure determination, and refinement is also listed in Table S4.

Kinetics and Binding Affinity of MHV S1 NTD and mCEACAM1a by Surface
Plasmon Resonance Using Biacore. The kinetics and binding affinity of MHV S1
NTD and mCEACAM1a were measured by surface plasmon resonance using
a Biacore 3000. The surface of a C5 sensor chip was first activated with
N-hydroxysuccinimide, MHV S1 NTD was then injected and immobilized to the
surface of the chip, and the remaining activated surface of the chip was
blocked with ethanolamine. Soluble mCEACAM1a was introduced at a flow
rate of 20 μL/min at different concentrations. Kinetic parameters were
determined using BIA-EVALUATIONS software.

mCEACAM1a-Dependent Cell Entry of Lentiviruses Pseudotyped with MHV-A59
Spike Protein. Lentiviruses pseudotyped with MHV-A59 spike protein were
produced as previously described (44). Briefly, plasmid encoding wild-type or
mutant MHV-A59 spike protein was cotransfected into 293T cells with helper
plasmid psPAX2 and reporter plasmid pLenti-GFP at molar ratio 1:1:1 using
polyethyleneimine (Polysciences Inc.). Forty-eight hours posttransfection, the
resulting pseudotyped viruses were harvested and inoculated onto the 293
cells expressing mCEACAM1a. After overnight incubation, cells were washed twice with
PBS and lifted by 100 μL of 0.25% trypsin and 0.38 mg/mL EDTA. After being
washed twice with PBS, cells were fixed with 1% paraformaldehyde and
analyzed for GFP expression by flow cytometry. All experiments were repeated
at least three times. The expression levels of spike proteins in pseudotyped
viruses were measured by Western blotting using polyclonal goat antibody to
MHV-A59 spike protein (Fig. S5).
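
As an illustration of how replicate flow cytometry readouts of this kind can be summarized with standard errors (SEs), the following minimal sketch computes a mean and SE and, optionally, expresses mutant entry relative to wild type. The text does not describe the authors' analysis pipeline, so the function names, the normalization to wild type, and the numbers in the usage note are all hypothetical assumptions made only for illustration.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def mean_and_se(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard error of the mean (sample SD / sqrt(n)) of replicates."""
    if len(values) < 2:
        raise ValueError("need at least two replicates to estimate an SE")
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean, se


def relative_entry(mutant: Sequence[float], wild_type: Sequence[float]) -> List[float]:
    """Express each mutant replicate as a percentage of the mean wild-type signal.

    Normalizing to wild type is an assumption of this sketch, not a step
    stated in the text.
    """
    wt_mean = statistics.mean(wild_type)
    if wt_mean <= 0:
        raise ValueError("wild-type mean must be positive")
    return [100.0 * value / wt_mean for value in mutant]


if __name__ == "__main__":
    # Hypothetical percentages of GFP-positive cells from three repeats.
    wild_type = [20.0, 22.0, 18.0]
    mutant = [5.0, 4.0, 6.0]
    print(mean_and_se(relative_entry(mutant, wild_type)))
```

With at least three independent repeats, as stated above, the SE reflects the variability between experiments rather than the number of cells counted within any one run.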

Sugar-Binding Assays of Coronavirus S1 NTDs. Sugar-binding assays of
coronavirus S1 NTDs were performed as previously described (13, 33). Briefly, for
the dot-blot overlay assay, 10 μg of bovine submaxillary gland mucin (BSM)
(Sigma-Aldrich) was spotted onto nitrocellulose membranes. The
membranes were dried completely, blocked with BSA at 4 °C overnight, and
either mock-treated or treated with 20 mU/mL Arthrobacter ureafaciens
neuraminidase (Roche Applied Science) at 25 °C for 2 h. The membranes
were then incubated with 1 μM coronavirus S1 NTDs containing a C-terminal
His tag at 4 °C for 2 h, washed five times with PBS, incubated with anti-His
antibody (Invitrogen) at 4 °C for 2 h, washed five times with PBS again,
incubated with HRP-conjugated goat anti-mouse IgG antibody (1:5,000) at 4 °C
for 2 h, and washed five times with PBS. Finally, the bound proteins were
detected using a chemiluminescence reagent (ECL plus, GE Healthcare). For